Tuning Text Classification for Hereditary Diseases with Section Weighting

نویسندگان

  • Jörg Hakenberg
  • Juliane Rutsch
  • Ulf Leser
چکیده

Motivation: Information in life science publications is heterogeneously distributed over various sections. Depending on research questions, different sections cover more or less of the data needed to answer them. Our approach, called section weighting, seeks to make use of information coverage and density found in typical life science publications. We study the impact of section weighting on text classification according to hereditary diseases. Results: Our results indicate that weighting sections can improve text classification. Our systems gain 7% in F1-measure when we add section weighting. Proper composition of features is equally crucial, improving our results by 11%. Combining both techniques, the system yields a performance 18% higher than the baseline classifier. For our research question, favoring the sections Abstract, Introduction, and Materials and Methods yields the best results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

متن کامل

A QUADRATIC MARGIN-BASED MODEL FOR WEIGHTING FUZZY CLASSIFICATION RULES INSPIRED BY SUPPORT VECTOR MACHINES

Recently, tuning the weights of the rules in Fuzzy Rule-Base Classification Systems is researched in order to improve the accuracy of classification. In this paper, a margin-based optimization model, inspired by Support Vector Machine classifiers, is proposed to compute these fuzzy rule weights. This approach not only  considers both accuracy and generalization criteria in a single objective fu...

متن کامل

Feature Selection for Natural Language Call Routing Based on Self-Adaptive Genetic Algorithm

The text classification problem for natural language call routing was considered in the paper. Seven different term weighting methods were applied. As dimensionality reduction methods, the feature selection based on self-adaptive GA is considered. k-NN, linear SVM and ANN were used as classification algorithms. The tasks of the research are the following: perform research of text classification...

متن کامل

An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...

متن کامل

Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification

Text Classification is one of the booming area in research with the availability of huge amount of electronic data in the form of news article, research articles, email message, blog, web pages etc. Text Representation is a vital step for text classification. In text representation, term weighting method assigns appropriate weights to the term to get better performance; the term weighting metho...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994